Causes and Effects of Unanticipated Numerical Deviations in Neural Network Inference Frameworks

Neural Information Processing Systems

Hardware-specific optimizations in machine learning (ML) frameworks can cause numerical deviations of inference results. Quite surprisingly, despite using a fixed trained model and fixed input data, inference results are not consistent across platforms, and sometimes not even deterministic on the same platform. We study the causes of these numerical deviations for convolutional neural networks (CNN) on realistic end-to-end inference pipelines and in isolated experiments. Results from 75 distinct platforms suggest that the main causes of deviations are differences in SIMD use on CPUs and the runtime selection of convolution algorithms on GPUs. We link the causes and propagation effects to properties of the ML model and evaluate potential mitigations. We make our research code publicly available.
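As one illustration of the GPU-side cause, here is a hedged PyTorch sketch (assuming PyTorch as the inference framework; the paper studies realistic pipelines, and the toy CNN below is not the paper's model) that disables cuDNN's runtime convolution-algorithm selection, requests deterministic kernels, and checks whether two runs agree bitwise:

```python
import torch
import torch.nn as nn

# Request deterministic kernels and disable cuDNN autotuning, which can
# otherwise pick a different convolution algorithm on every run.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),   # toy CNN standing in for a trained model
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
).eval()
x = torch.randn(1, 3, 32, 32)         # fixed input data

with torch.no_grad():
    out1, out2 = model(x), model(x)

# On a fixed platform the two runs should now match bitwise. Across
# platforms (different SIMD widths, different GPU kernels), per the
# abstract above, deviations may remain.
print(torch.equal(out1, out2), (out1 - out2).abs().max().item())
```

Note that these switches only remove run-to-run nondeterminism on a fixed platform; they do not address cross-platform deviations from differing SIMD use.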



Training Uncertainty

Neural Information Processing Systems

The first subset (in red) is utilized to evaluate a traditional accuracy-based loss function ℓ_a, such as the cross entropy. This benchmark is based on a loss function designed to incentivize the trained model to produce the smallest possible conformal prediction sets with the desired coverage (e.g., 90% if α = 0.1). The hybrid training procedure is similar to Algorithm 1, in the sense that it relies on analogous soft-sorting, soft-ranking, and soft-indexing algorithms to evaluate a differentiable approximation Ŵ_i of the conformity score W_i in (8). Above, the second equality follows directly from the fact that S(x, U; π, t), defined in (A2), is by construction increasing in t, and therefore Y ∉ S(x, U; π, 1−α) if and only if min{t ∈ [0,1] : Y ∈ S(x, U; π, t)} > 1−α. The proof consists of showing that ℓ_a and ℓ_u are separately minimized by π̂ = π, although only approximately in the latter case.
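For context, a minimal NumPy sketch of standard split-conformal prediction sets with the desired coverage (e.g., 90% if α = 0.1). This is the plain non-differentiable construction, not the paper's hybrid training loop with soft-sorting, and the particular conformity score (one minus the true-class probability) is an assumption for illustration:

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal prediction sets for classification.

    cal_probs:  (n, K) predicted class probabilities on calibration data
    cal_labels: (n,)   true calibration labels
    test_probs: (m, K) predicted probabilities on test points
    """
    n = len(cal_labels)
    # Conformity score: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Include every label whose score would fall below the threshold.
    return test_probs >= 1.0 - q  # (m, K) boolean set-membership matrix

# Toy usage: sets should cover the true label ~90% of the time for alpha = 0.1.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=200)
labels = np.array([rng.choice(5, p=p) for p in probs])
sets = conformal_sets(probs[:100], labels[:100], probs[100:])
print("avg set size:", sets.sum(axis=1).mean())
print("coverage:", sets[np.arange(100), labels[100:]].mean())
```

The loss described above pushes the model so that sets built this way are as small as possible while retaining coverage.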


A Extension to k-Means and (k, p)-Clustering

Neural Information Processing Systems

The lower bound on opt(U) given in Lemma B.10 holds for ρ-metric spaces with no modifications. By making the appropriate modifications to the proof of Theorem C.1, we can extend this theorem to ρ-metric spaces. In particular, we can obtain a proof of Theorem A.5 by taking the proof of Theorem C.1 and adding extra ρ factors whenever the triangle inequality is applied. We first prove Lemma B.1, which shows that the sizes of the sets U [...]. By Lemma B.2, we get that [...]. Henceforth, we fix some positive ξ and sufficiently large α such that Lemma B.3 holds. By now applying Lemma B.4, it follows that [...]. The following lemma is proven in [25]. [...] the first inequality follows from Lemma B.1, the third inequality follows from Lemma B.7, and the fourth inequality follows from the [...]. The second inequality follows from Lemma B.8, the third inequality from averaging and the choice of [...]. Proof of Lemma 3.3: It follows that with probability at least 1 − e^{−[...]} [...]. Hence, by Theorem D.1, any algorithm with O(poly(k)) query time must have Ω(k) amortized update time.


Fully Dynamic k-Clustering in Õ(k) Update Time

Neural Information Processing Systems

Clustering is a fundamental problem in unsupervised learning with several practical applications. In clustering, one is interested in partitioning elements into different groups (i.e., clusters).
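A minimal NumPy sketch of the static (k, p)-clustering objective from the companion appendix above; the fully dynamic Õ(k)-update-time algorithm itself is not reproduced here, and the toy data and k = 5 are arbitrary:

```python
import numpy as np

def clustering_cost(points, centers, p=1):
    """(k, p)-clustering objective: sum over points of the distance to the
    nearest center, raised to the power p (p = 1 gives k-median, p = 2 k-means)."""
    # Pairwise Euclidean distances between all points and all centers.
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return (d.min(axis=1) ** p).sum()

rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2))
ctrs = pts[rng.choice(len(pts), size=5, replace=False)]  # k = 5 arbitrary centers
print("k-median cost:", clustering_cost(pts, ctrs, p=1))
print("k-means  cost:", clustering_cost(pts, ctrs, p=2))
```

A dynamic algorithm maintains a set of centers approximately minimizing this cost as points are inserted and deleted, with the update time bounding the work per insertion or deletion.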



Supplementary Material

Neural Information Processing Systems

We provide additional results for EGTA applied to networked MARL system control for CPR management. Restraint percentages under different regeneration rates: the heatmaps in Figure 7 (A-C) highlight the differences in restraint percentage for different values of α as the regeneration rate is changed from high (0.1) to low. In the case where agents are completely self-interested (α = 0), shown in (A), the majority of algorithms without communication display very low levels of restraint for all rates of regeneration. The orange ovals in these diagrams indicate which system configurations correspond to the highest expected payoff for all agents. Schelling diagrams using a different parameterisation: an alternative parameterisation for a Schelling diagram is to plot payoffs for a particular agent (cooperating or defecting) with respect to the number of other cooperators on the x-axis, instead of the total number of cooperators.
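A hedged matplotlib sketch of this alternative parameterisation, with hypothetical linear payoff curves standing in for the measured ones (the group size N and the payoff coefficients are invented for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

N = 10                       # group size (hypothetical)
others = np.arange(N)        # number of *other* cooperators on the x-axis

# Hypothetical payoff curves; empirical estimates from the simulations
# could be substituted. Defectors free-ride, so their curve sits above.
payoff_coop = 1.0 + 0.8 * others
payoff_defect = 2.0 + 0.8 * others

plt.plot(others, payoff_coop, label="cooperate")
plt.plot(others, payoff_defect, label="defect")
plt.xlabel("number of other cooperators")
plt.ylabel("expected payoff")
plt.title("Schelling diagram (alternative parameterisation)")
plt.legend()
plt.show()
```

Plotting against the number of other cooperators keeps the two curves on a common x-axis, which makes the incentive gap between cooperating and defecting at each population state directly readable.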